Measuring Feature Diversity in Native Language Identification

Authors

  • Shervin Malmasi
  • Aoife Cahill
Abstract

The task of Native Language Identification (NLI) is typically solved with machine learning methods, and systems make use of a wide variety of features. Some preliminary studies have examined the effectiveness of individual features; however, no systematic study of feature interaction has been carried out. We propose a function to measure feature independence and analyze its effectiveness on a standard NLI corpus.
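The abstract does not say which function is proposed. As an illustration only, the sketch below uses Yule's Q statistic, a standard measure of pairwise classifier diversity, to compare two classifiers that were each trained on a different feature type. The function name `yules_q` and the boolean per-document correctness vectors are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def yules_q(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Yule's Q between two classifiers, given per-document correctness.

    Values near 0 suggest the two feature types err independently,
    values near 1 suggest they succeed and fail on the same documents,
    and negative values suggest they tend to err on different documents.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same documents")

    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1  # both classifiers correct
        elif not a and not b:
            n00 += 1  # both classifiers wrong
        elif a:
            n10 += 1  # only the first classifier correct
        else:
            n01 += 1  # only the second classifier correct

    numerator = n11 * n00 - n01 * n10
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        raise ValueError("Yule's Q is undefined for this contingency table")
    return numerator / denominator


# Hypothetical correctness vectors for two feature types on four documents.
word_ngrams = [True, True, False, False]
pos_ngrams = [True, False, True, False]
print(yules_q(word_ngrams, pos_ngrams))
```

Computing this value for every pair of feature types gives a matrix in which low absolute values would flag the pairs that are most likely to complement each other in an ensemble.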

Similar resources

Using N-gram and Word Network Features for Native Language Identification

We report on the performance of two different feature sets in the Native Language Identification Shared Task (Tetreault et al., 2013). Our feature sets were inspired by existing literature on native language identification and word networks. Experiments show that word networks have competitive performance against the baseline feature set, which is a promising result. We also present a discussio...

From Language to Family and Back: Native Language and Language Family Identification from English Text

Revealing an anonymous author’s traits from text is a well-researched area. In this paper we aim to identify the native language and language family of a non-native English author, given his/her English writings. We extract features from the text based on prior work, and extend or modify it to construct different feature sets, and use support vector machines for classification. We show that nat...

SeerNet@INLI-FIRE-2017: Hierarchical Ensemble for Indian Native Language Identification

Native Language Identification has played an important role in forensics primarily for author profiling and identification. In this work, we discuss our approach to the shared task of Indian Language Identification. The task is primarily to identify the native language of the writer from the given XML file which contains a set of Facebook comments in the English language. We propose a hierarchi...

Exploiting Parse Structures for Native Language Identification

Attempts to profile authors according to their characteristics extracted from textual data, including native language, have drawn attention in recent years, via various machine learning approaches utilising mostly lexical features. Drawing on the idea of contrastive analysis, which postulates that syntactic errors in a text are to some extent influenced by the native language of an author, this...

Bharathi SSN @ INLI-FIRE-2017: SVM based approach for Indian Native Language Identification

Native Language Identification (NLI) is the task of identifying the native language of a writer or a speaker by analyzing their text. NLI can be important for a number of applications. In forensic linguistics, native language is often used as an important feature for authorship profiling and identification. Nowadays due to the huge usage of social media sites and online interactions, receiving ...

Publication date: 2015